Skip to content

3622: [C] Fix negative length validation in binary decoding#27

Open
martin-augment wants to merge 1 commit intomainfrom
pr-3622-2026-01-13-20-33-57
Open

3622: [C] Fix negative length validation in binary decoding#27
martin-augment wants to merge 1 commit intomainfrom
pr-3622-2026-01-13-20-33-57

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

3622: To review by AI

The read_bytes() and read_string() functions in encoding_binary.c
decode length values using zigzag encoding, which can produce negative
numbers from malicious input. These negative values were passed directly
to avro_malloc(), causing allocation failures or undefined behavior.

This patch adds validation to reject negative length values with a
clear error message before attempting memory allocation.

Bug: Negative length values from varint decoding cause
     allocation-size-too-big when cast to size_t

Impact: DoS via crafted binary input

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions bot added the C label Jan 13, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 13, 2026

Walkthrough

The changes add validation checks to prevent negative length values from being processed in the encoding binary module. Specifically, the read_bytes and read_string functions now validate the parsed length immediately after reading it. If a negative length is detected, an error status (Invalid bytes/string length) is set and EINVAL is returned, preventing subsequent buffer allocations and processing operations.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a739f8 and a7ff134.

📒 Files selected for processing (1)
  • lang/c/src/encoding_binary.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (2)
lang/c/src/encoding_binary.c (2)

130-133: LGTM! Correctly validates negative length before allocation.

The validation properly rejects negative lengths decoded from malicious varint input, preventing the undefined behavior that would occur when casting a negative int64_t to size_t for avro_malloc().


182-185: LGTM! Consistent validation with read_bytes.

The check correctly guards against negative string lengths before buffer allocation.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Jan 13, 2026

Code Review - PR #27: [C] Fix negative length validation in binary decoding

Summary

This PR addresses a critical security vulnerability where negative length values from zigzag-decoded varints could be passed to avro_malloc(), causing allocation failures or undefined behavior. The fix adds validation to reject negative lengths before memory allocation.

✅ Strengths

Security Fix - Critical

  • Correctly addresses a DoS vulnerability where malicious input could crash the application
  • Prevents undefined behavior from casting negative int64_t to size_t in malloc calls
  • Clear and appropriate error messages using PRId64 format specifier

Code Quality

  • Consistent implementation across both read_bytes() and read_string()
  • Follows existing error handling patterns in the codebase
  • Uses appropriate error code (EINVAL) for invalid input
  • Proper formatting and style matching the surrounding code

⚠️ Issues & Recommendations

1. Critical: Missing validation in skip_bytes() (encoding_binary.c:144-152)

The skip_bytes() function has the same vulnerability but was not patched:

static int skip_bytes(avro_reader_t reader)
{
    int64_t len = 0;
    int rval;
    check_prefix(rval, read_long(reader, &len),
                 "Cannot read bytes length: ");
    AVRO_SKIP(reader, len);  // ⚠️ No validation of negative len!
    return 0;
}

Impact: An attacker can still trigger undefined behavior via the skip path.

Recommendation: Add the same negative length check to skip_bytes():

if (len < 0) {
    avro_set_error("Invalid bytes length: %" PRId64, len);
    return EINVAL;
}

2. Missing: Test Coverage

No tests were added to verify the fix. This security vulnerability should have regression tests.

Recommendation: Add tests that:

  • Verify negative length values are rejected with proper error messages
  • Test boundary conditions (0, -1, INT64_MIN)
  • Ensure the fix works for both read_bytes() and read_string()
  • Confirm no memory leaks on error paths

Example test structure:

// Test negative length in read_bytes
avro_reader_t reader = create_reader_with_negative_varint();
char *bytes = NULL;
int64_t len = 0;
int result = read_bytes(reader, &bytes, &len);
assert(result == EINVAL);
assert(bytes == NULL);

3. Consideration: Integer Overflow Protection

While the negative check is good, extremely large positive values could still cause issues:

  • Casting very large int64_t to size_t on 32-bit systems
  • Allocation of unreasonably large buffers (e.g., INT64_MAX bytes)

Recommendation (Future Enhancement): Consider adding an upper bound check:

#define MAX_REASONABLE_LENGTH (1024 * 1024 * 1024)  // 1GB or appropriate limit

if (*len < 0 || *len > MAX_REASONABLE_LENGTH) {
    avro_set_error("Invalid bytes length: %" PRId64, *len);
    return EINVAL;
}

4. Minor: Code Location

The validation is placed after check_prefix but before allocation. This is correct, but consider documenting why this ordering is important (to avoid duplicate error messages).

🔒 Security Assessment

Severity: High (DoS vulnerability)
Attack Vector: Crafted Avro binary input with negative varint lengths
Mitigation: Properly validates input before allocation
Residual Risk: skip_bytes() still vulnerable

📊 Performance Considerations

The added validation has negligible performance impact:

  • Single integer comparison per read operation
  • No additional memory allocations
  • Minimal CPU overhead

✅ Best Practices Compliance

  • ✅ Error handling follows existing patterns (avro_set_error, return EINVAL)
  • ✅ Uses portable format specifiers (PRId64)
  • ✅ Consistent style with surrounding code
  • ✅ Clear commit message with security context
  • ❌ Missing test coverage
  • ❌ Incomplete fix (skip_bytes not patched)

🎯 Recommendation

Do not merge yet. Please address:

  1. Critical: Add negative length validation to skip_bytes() (same pattern as this PR)
  2. Important: Add regression tests for negative length handling
  3. Consider: Add upper bound validation for defense in depth

Once these issues are addressed, this will be a solid security fix ready for merge.


🤖 Generated with Claude Code

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Jan 13, 2026

🤖 Augment PR Summary

Summary: Adds explicit validation to reject negative lengths while decoding Avro binary bytes and string values.

Technical Notes: On malformed negative lengths, decoding now sets an error message and fails fast with EINVAL to avoid invalid allocations.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

int rval;
check_prefix(rval, read_long(reader, len),
"Cannot read bytes length: ");
if (*len < 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see negative-length validation added for read_bytes/read_string. Note that skip_bytes (and thus skip_string) still accepts a decoded negative length and passes it to AVRO_SKIP, which could behave badly on malformed input.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:security; feedback:The Augment AI reviewer is correct! The same kind of problem as the fixed ones could be seen in the skip_bytes() method. It should be fixed the same way as the others. Prevents a security attack by crafting an Avro file with negative length for a field with a Bytes schema.

avro_set_error("Invalid bytes length: %" PRId64, *len);
return EINVAL;
}
*bytes = (char *) avro_malloc(*len + 1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avro_malloc is a thin wrapper over realloc and doesn’t guard size overflows; if *len is extremely large, *len + 1 can overflow (UB) and/or truncate when converted to size_t. Consider adding an upper-bound check (also applies to read_string’s str_len + 1) before computing the allocation size.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:security; feedback:The Augment AI reviewer is correct! A specially crafted Avro file with a really big length value will lead to trying to allocate a lot of memory and possible crash due to out-of-memory error.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lang/c/src/encoding_binary.c (1)

144-152: Missing negative length validation in skip_bytes.

The same malicious varint input that could produce negative lengths in read_bytes can also affect skip_bytes. Here, len is passed directly to AVRO_SKIP without validation, which could cause undefined behavior depending on how the macro handles negative values.

For consistency and completeness of the security fix, consider adding the same validation:

Proposed fix
 static int skip_bytes(avro_reader_t reader)
 {
 	int64_t len = 0;
 	int rval;
 	check_prefix(rval, read_long(reader, &len),
 		     "Cannot read bytes length: ");
+	if (len < 0) {
+		avro_set_error("Invalid bytes length: %" PRId64, len);
+		return EINVAL;
+	}
 	AVRO_SKIP(reader, len);
 	return 0;
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a739f8 and a7ff134.

📒 Files selected for processing (1)
  • lang/c/src/encoding_binary.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (2)
lang/c/src/encoding_binary.c (2)

130-133: LGTM! Correctly validates negative length before allocation.

The validation properly rejects negative lengths decoded from malicious varint input, preventing the undefined behavior that would occur when casting a negative int64_t to size_t for avro_malloc().


182-185: LGTM! Consistent validation with read_bytes.

The check correctly guards against negative string lengths before buffer allocation.

@martin-augment
Copy link
Copy Markdown
Owner Author

1. Critical: Missing validation in skip_bytes() (encoding_binary.c:144-152)

The skip_bytes() function has the same vulnerability but was not patched:

static int skip_bytes(avro_reader_t reader)
{
    int64_t len = 0;
    int rval;
    check_prefix(rval, read_long(reader, &len),
                 "Cannot read bytes length: ");
    AVRO_SKIP(reader, len);  // ⚠️ No validation of negative len!
    return 0;
}

Impact: An attacker can still trigger undefined behavior via the skip path.

Recommendation: Add the same negative length check to skip_bytes():

if (len < 0) {
    avro_set_error("Invalid bytes length: %" PRId64, len);
    return EINVAL;
}

value:useful; category:security; feedback:The Claude AI reviewer is correct! The same kind of problem as the fixed ones could be seen in the skip_bytes() method. It should be fixed the same way as the others. Prevents a security attack by crafting an Avro file with negative length for a field with a Bytes schema.

@martin-augment
Copy link
Copy Markdown
Owner Author

3. Consideration: Integer Overflow Protection

While the negative check is good, extremely large positive values could still cause issues:

  • Casting very large int64_t to size_t on 32-bit systems
  • Allocation of unreasonably large buffers (e.g., INT64_MAX bytes)

Recommendation (Future Enhancement): Consider adding an upper bound check:

#define MAX_REASONABLE_LENGTH (1024 * 1024 * 1024)  // 1GB or appropriate limit

if (*len < 0 || *len > MAX_REASONABLE_LENGTH) {
    avro_set_error("Invalid bytes length: %" PRId64, *len);
    return EINVAL;
}

value:good-to-have; category:security; feedback:The Claude AI reviewer is correct! A specially crafted Avro file with a really big length value will lead to trying to allocate a lot of memory and possible crash due to out-of-memory error.

@martin-augment
Copy link
Copy Markdown
Owner Author

144-152: Missing negative length validation in skip_bytes.

The same malicious varint input that could produce negative lengths in read_bytes can also affect skip_bytes. Here, len is passed directly to AVRO_SKIP without validation, which could cause undefined behavior depending on how the macro handles negative values.

value:useful; category:security; feedback:The CodeRabbit AI reviewer is correct! The same kind of problem as the fixed ones could be seen in the skip_bytes() method. It should be fixed the same way as the others. Prevents a security attack by crafting an Avro file with negative length for a field with a Bytes schema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants